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Abstract 

We conduct an asymptotic risk analysis of the nonlocal means image denois- 
ing algorithm for the Horizon class of images that are piecewise constant with 
a sharp edge discontinuity. We prove that the mean square risk of an opti- 
mally tuned nonlocal means algorithm decays according to rT^ log^''^"^^ n, for 
an n-pixcl image with e > 0. This decay rate is an improvement over some 
of the predecessors of this algorithm, including the linear convolution filter, 
median filter, and the SUSAN filter, each of which provides a rate of only 
rr'^l^ . It is also within a logarithmic factor from optimally tuned wavelet 
thresholding. However, it is still substantially lower than the the optimal 
minimax rate of n~^^^. 
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1. Introduction 



1.1. Image denoising 

The long history of image denoising is testimony to its central impor- 
tance in image processing. A wide range of algorithms have been developed, 
ranging from simple linear convolution and median filtering to total variation 
denoising [T] and sparsity exploiting algorithms such as wavelet shrinkage |2J . 
Due to the sensitivity of human visual system to edges, the ability to preserve 
sharp edges is an important criterion for noise removal algorithms. Therefore 
Korostelev and Tsybakov proposed a framework to characterize the perfor- 
mance of image denoisers on edges [3]. Based on this framework, we aim to 
characterize the performance of several denoising algorithms that represent 
the current state of the art image enhancement techniques. In particular, we 
will focus on the popular and powerful nonlocal means (NLM) algorithm. 

1.2. The minimax framework 

In this paper, we are interested in estimating a function / : [0, 1]^ — )■ R 
from noisy pixel level observations. Define Pixel(z, j) = [-, — ) x [i, — ), 
and let Xij = Ave(/ | Pixel(i, j)) be the pixel level averages of /. We observe 
the samples 

where, Zij is iid A^(0, a^). The goal is to recover the original pixel values Xij 
from the observations yij, based on some information about the function /. 
For a given function / and an estimator / we define the risk function as 



(1) 



The risk can also be written as 
R 



(2) 

where the first and second terms correspond to the bias and variance of the 
estimator /, respectively. 

Let / belong to a class of functions J-", e.g., a class of edge-like images 
that represent edges with different shapes and orientations. The risk defined 



2 



in ([T]) depends on the specific choice of /. We define the risk of an estimator 
/ on the class T as the risk of the worst-case signal, i.e., 

K(^,/) = supi?„(/,/). 

The minimax risk over functions in T is then defined as the risk of the best 
possible estimator, i.e., 

K(F) = infsupi?„(/,/). 

The minimax risk is a lower bound for the performance of all measurable 
estimators for signals in J-". 

In this paper we are interested in the asymptotic setting where the number 
of pixels n — )• oo. For all of the estimators we consider, /) — )■ as 

n — )■ oo. Therefore, we consider the decay rate of the risk as the performance 
measure. We will derive the minimax risk for several popular image denoising 
techniques below. 

We will use the following asymptotic notation in this paper. 

Definition 1. f{n) = 0{g{n)) as n ^ oo, if and only if there exist uq and 
c such that for any n > hq, \f{n)\ < c\g{n)\. Likewise, f{n) = Q{g{n)) as 
n — 7- oo, if and only if there exist Uq and c such that for any n > uq, \f{n)\ > 
c\g{n)\. Finally, f{n) = Q{gin)), if fin) = 0{g{n)) and f{n) = il{g{n)). 
We may interchangeably use f{n) x g[n) for f{n) = <d{g{n)). 

Definition 2. f(n) = o[g{n)) if and only z/lim„^oo = 0. 

1.3. Horizon edge model 

Several different image edge models have been developed in the image 
processing and denoising literature. Here we will use the Horizon model 
that contains piecewise constant images with edges that are smooth in the 
direction of the edge contour but discontinuous in the direction orthogonal 
to the edge contour [21 H]. Specifically, let H older" (C) be the class of Holder 
smooth functions on R, defined as follows: h G Hdlder°'{C) if and only if 

<c\ti-tX~\ 

where k = [a\ . Given a one-dimensional smooth edge contour function h, 
we define the image fh : [0,1]^ R as fh{ti,t2) = l{t^<hiti)}, where is 
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h(t,) 



Figure 1: An example of a Horizon function, a piecewise constant image containing an 
edge that is Holder"' smooth in the direction of the edge contour but discontinuous in the 
direction orthogonal to the edge contour. 



the indicator function. Based on this construction, we define the Horizon 
class of functions as 

H^[C) = {A(ti,t2) : h G Hdlder"{C)r]Hdlder\l)}, (3) 

where a is the smoothness of the edge contour. Figure [TJplots a representative 
function from this class. 

The following theorem, proved in p], specifies the minimax risk of the 
class of all measurable estimators on H"{C). 

Theorem 1. [3J For a > 1, the minimax risk of the class H°'{C) is 

K{H-{C))^n^K (4) 

We are particularly interested in the case a = 2 edges, for which the 
optimal rate is n~^/^. The rate provided in the above theorem is the Holy 
Grail of image denoising algorithms. 

1.4- A menagerie of denoising algorithms 

We will perform a minimax risk analysis of not just nonlocal means but 
a number of other popular image denoising algorithms. 
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Linear filtering 

The classical denoising method is the linear convolution filter, which es- 
timates the image via 

fg^ih j) = ^Y1 9m,eyi-m,j-e, (5) 

m e 

where 5^ is a two dimensional filter impulse response that satisfies J2i J2j 9i,j = 
l|^ When all the weights gij are equal, the algorithm is called the running 
average or the box filter. Most of the linear filters used in practice are sym- 
metrical and approximately isotropic. 

Definition 3. Let g be a real and symmetric filter response, i. e., g^ j = g^ij = 
gi-j, and let G{ui,U2) represent its two-dimensional Fourier transform. The 
filter is isotropic if and only if there exists a function F : R — > C, such that 

G{(jJl,UJ2) = F i^^l + 1^2^ V — TT < (jJi, UJ2 < TT. 

Isotropic filters are popular, because they treat image features similarly 
regardless of their directions. Let grad(-) be the gradient operator. The 
following theorem, proved in Section 4. If provides the decay rate of the risk 
of linear convolution. 

Theorem 2. Consider the linear convolution filter (|5| and suppose 
that g is real, symmetric, and isotropic. Furthermore, assume that 
||grad(G'(wi, W2))||2 < C for a fixed constant C . Then, 

inf sup Rn{fjg^)-n-h 

9 f(^H'='{C) 

Castro and Donoho [5j have proved a similar result for the special case of 
the box filter. While the Horizon model used in [5] is slightly different from 
our model, their proof works in our setting as well. 

1.4.2. Yaroslavsky / SUSAN filter 

While linear filters are popular in image processing due to their simplicity, 
they unfortunately blur images with sharp edges. One popular alternative is 



*For simplicity of analysis, we use a periodic extension of y at the image boundaries. 
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to adapt the weight of each pixel in the average ([s]) according to the distance 
between its noisy value and the value of the pixel we aim to estimate. Let 
Cf^j- 4 {(m,£) \i-An<m<i + An,j-An<i<j + A„} denote the 
A„-neighborhood of the pixel One popular approach for setting the 

weights is 

y (ym.,e-yi,j)'^ 

Wij{m,e) = e ^ , 
from which we calculate the estimate 

l^m=i-A„ l^e=j-A„ '^ijV'h- 



Only the pixels in the A„-neighborhood of contribute to the estimate 
of that pixel. This algorithm is called the Yaroslavsky Filter (YF) or SUSAN 
filter [SllT]; slight modifications are known as the bilateral filter [H] and a-filter 

To calculate the risk of the YF, we consider a slightly different, oracle- 
based algorithm. Suppose that in setting the weights wjj (m, i) of the YF we 
have access to the actual (and not the noisy) value of the pixel Using 
this oracle information we can set the weights according to 



w^ j (m, i) = e 2^ 



Intuitively, the oracle weights are "less noisy" than the actual filter weights. 
Plugging these weights into (|6]) we obtain what we call the semi-oracle 



Yaroslavsky filter (SYF). The following theorem, proved in Section 4.2, shows 
that, as far as the decay rate is concerned, the SYF's performance is the same 
as the linear filter and box filter. 

Theorem 3. The risk of SYF algorithm satisfies 

inf sup i?„(/,/^^) = r](n-2/3). 

1.4-3. Sparsity based denoising 

Another popular class of image denoising methods exploit sparsity in 
some transform domain via thresholding. Wavelets are often used as the 
sparsity domain for natural images. Let W(?/) represent the separable two- 
dimensional wavelet transform of the image, let XW represent the inverse 
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wavelet transform, and let T be the hard thresholding function, i.e., Te{x) = 
xl{\x\>e}- Then wavelet thresholding denoising corresponds to 

= iw{remy)))- 

Donoho and Johnstone have proven that supj^jja(^c) ^n{f, f^) = ^{n~^) [1], 
|10j . Even though this rate is an improvement over the above algorithms, is 

4 

still far from the optimal achievable rate of n~3 for a = 2. 

This suboptimality spurred the development of other sparsity-inducing 
transformations, including curvelets [H], wedgelets shearlets [12], and 
contourlets [I3] . Among these transforms, wedgelet denoising provably achieves 
the optimal rate of n~3 for a = 2 However, wedgelet denoising performs 
poorly on textures, which has limited its application in practice to date. 



2. Nonlocal means denoising 

The YF estimator sets its weights according the noisy pixel values and 
their spatial vicinity; however neither of these two features are reliable for 
noisy, edgy images. In contrast, the nonlocal means (NLM) algorithm sets 
its weights according to the proximity of the image patch surrounding each 
noisy pixel with other patches in the image [H] . Define the (5n-neighborhood 
distance ds^{yij,ym,£) between two observations as 

where = (2(5„ + 1)^ — 1. Note that, in contrast to the definition in [Tl], we 
have removed the center element — |/n,pP from the summation. Since we 
assume that 5„ — )■ oo as n — )■ oo, the effect is neghgible on the asymptotic 
performance. But, as we will see in Section |4| removing the center element 
simplifies the calculations considerably. NLM uses the neighborhood dis- 
tances to estimate 

where S = {1,2, ...,n} x {1,2, and Wij{m,i) is set according to 

the ^^-neighborhood distance between yij and Vm/- For the simplicity of 
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notation, in cases where both the reference pixel {i,j) and the algorithm 
are obvious from the context, we will omit the superscript and subscript of 
the weight and use the simplified notation Wm/ instead of w^j{m,i). It is 
straightforward to verify that Fj{d'j^{yij,ym/)) = d'g^{xij,Xm,£) + 2cr^, which 
suggests the following strategy for setting the weights: 



where tn is the threshold parameter. Soft/tapered versions of setting the 
weights have been explored and are often used in practice [I^. However, 
the above untapered weights capture the essesnse of the algorithm while 
simplifying the analysis. We postpone the discussion of tapered weights 
until Section |5l 

There are two main differences between the NLM and YF algorithms. 
First, the pixels that contribute in the NLM averaging are not necessarily in 
the local neighborhood of the reference pixel (hence the monicker "nonlocal" ). 
Second, the NLM weights depend not on the difference between the pixel 
values but on distance between the pixel neighborhoods. In other words the 
pixel neighborhood is even more important than the pixel value. 

To derive a lower bound for the risk of NLM, we will analyze two algo- 
rithms that set the weights using some degree of oracle information regard- 
ing the true value of the signal. The full oracle NLM (FNLM) has access to 
'E{d'g^{yij,ym,e)) in setting the weights Wm,i in ([t]) and thus sets them using 
the noise-free values of the pixels 



The semi-oracle NLM (SNLM) differs only slightly from the standard NLM 
in that it uses the semi-oracle neighborhood distance 







.171= — 5n i=—S, 



•^i+i,j+m yn+i,p+m 




(10) 



and then sets the weights in (uh according to 




(11) 
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Unlike FNLM, SNLM assumes that just one-half of the noise is removed from 
the distance estimates. Therefore, the distances calculated in the SNLM are 
more accurate than the standard NLM but less accurate than in the FNLM. 
In the rest of the paper, we will use f^, f^, and to denote the NLM, 
SNLM, and FNLM estimators, respectively. 



3. Main Results 



Our first result, proved in Section [43] establishes an upper bound on the 
risk of NLM. 

Theorem 4. Fix e > and consider NLM denoising with 5n = 2 log^"*"^ n 
and tn = ^'^e . The risk of this algorithm over the class H'^(C) is 

log2 n 

sup fl(/,/-)=of'??^). (12) 

Before we discuss the implications of this theorem, it is important to 
note that, while we can improve the decay rate as close as we desire to 
0{n~Hog2 n), the constants that are involved in the big-0 notation grow as 
e decreases. Therefore, in practice very small values of e are not desirable. 
Comparing the upper bound (12) with the optimal minimax risk ^ in- 



dicates that NLM is suboptimal for a > 1. In other words, NLM cannnot 
exploit the smoothness of edge contours in images. 

The bound in Theorem |4] is for a specific choice of parameters, and it is 
natural to ask whether NLM can achieve the optimal rate with some other 
choice of parameters. To answer this question, we consider SNLM, which 
outperforms standard NLM in general. We make the following mild assump- 
tions: 

Al: The window size (5„, — )■ oo as n — oo. This assumption is critical to 
ensuring good performance of any NLM estimator. 



A2: The threshold is set to cr^ + 1„ as explained in (11) with t„ > 0. This 
ensures that if the neighborhood of pixel (m, i) is exactly the same as 
the neighborhood of pixel then Wm,e = 1 with high probability. 

A3: The threshold t„ is set such that, if the noise-free neighborhoods are 
different in more than half of their pixels, i.e., if d'^{xij, Xm,e) > ^, then 

F{w[j{mJ) = 1) = o{n-^). 
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A4: 5n = O(n^), for some /3 < 0.3. 

The following theorem provides a lower bound on the performance of 
SNLM. 

Theorem 5. Suppose that 6n and tn satisfy A1-A4- The risk of the SNLM 
over the class H°'{C) is 

inf sup i^(/,/^) = ^](r^-l). 

This bound is still suboptimal compared to the n~^^^ minimax rate for 
a = 2. In the words of John Cornyn 111, the junior United States Senator 
for Texas, "The problem with a mini-deal is we have a maxi-problem" [15j . 

Remarkably, this lower bound is achieved on a very simple image on 
which NLM would be assumed to work very well: l{t2<o.5} (see Figure |2|. 
Here is what goes wrong. Consider the estimation of an "edge" pixel 
that satisfies j = \nh{^)]. Define the set J = {{m,i) \ i = [nh{^)\} as 
the set of pixels just below the edge. We will prove the probability that a 
pixel in J contributes to the NLM estimate {wij{m,i) = 1) is larger than 
Po, where po does not depend on n. This happens due to the low "signal to 
noise ratio" in the distance estimates. Hence 0(n) pixels of J will contribute 
to the NLM estimate. Since these pixels have Xm/ = 1, they introduce a 
large bias in the estimate. In fact, we show below that the bias, as defined in 
(|2]), will be larger than . Here npo corresponds to the pixels below 

the edge that pass the threshold. This shows that the bias is clearly 0(1). 
Since there are n edge pixels, the risk of the estimator over the entire image 
is fi(n-i). 

4. Proofs of the Main Theorems 

4.I. Proof of Theorem \^ 

The proof has two main steps. The first step is to prove that there exists 
a linear filter for which the supremum risk is upper bounded by 0(?7,~^/^). 
For this step we use Theorem 3.1 and 3.2 from ||5j, which establish the same 
upper bound for the box filter. The second and more challenging step is to 
prove that no other linear filter can improve on this decay rate. The rest of 
this section is dedicated to the proof of this fact. 
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Figure 2: The simple image Horizon l{t,<o.5} used for proving the various lower bounds. 



Consider the function fhiti,t2) for h(t) = | and suppose that n is even. 
This function is displayed in Figure |2j Let 



X{ki,k2) = -X^X^^^i. 



n 



i2 



represent the Discrete Fourier Transform (DFT) of a discrete two-dimensional 
discrete signal x. Since y = x + z, the DFT of f^^ equals 

F^^ih, h) = Y{h, h) ■ G{h, h) = X{h, h)G{h, h) + Z{k,, k2)G{k^, k2), 

where Z is again iid A^(0, a^). For fhih, ^2) with h{ti) = I, X{ki, ^2) satisfies 

f if fci ^ 0, 

X{k,,k2)=l (13) 

I. 1— e n 

It is easy to see that Rn{fJ^^) = ' Fg^Wl), where 4 

Sfci fc2 l^(^i'^2)P- If we define B{f) as the bias of the estimator /, then 
we have 



^ ' ... gj^2 7rfc2 



l<.k2<n, odd 

1 

> 



2 

^ • sin^^ 

l<k2<n^''-\ odd " 
11 



The variance of the estimator is 



Var(/^^^) = ^5^|G(A:i,fc,)|V. 



We know that 



\G{u,,U2)\^ + 0{n~^), 



(14) 



where G is the continuous Fourier transform of g and satisfies ||grad(G)||2 < 
C . Since g is isotropic, there exists F : R — )■ C such that 



Changing the variables of integration in ( 14 ) to polar coordinate radius ojr = 
^/ujf+~uj^ angle 6, we have 

|G'(a;i,a;2)|' > 27r / r\F{r)\^dr = 2n iU2\GiO,uj2)\^du2. (15) 

Jr=0 JuJ2=0 




Combining (14) and (15) we have 



Var(A^^) = ^2H \Gih,k2)\'a' = ^ ^^[^(O, A:2)|V - 0(^-1). 

ki,k2 k2 

Summing the lower bounds for the bias and variance of this estimator, 
we obtain the following lower bound for the risk of linear filtering: 

i?„(/,/^^) = B2(A^^)+Var(/f'^) 



> 



1 E |l-G(0,fc2)p-^ + ^E^2l^(0'^^)l'^'-O(-"') 

n k2 



l<k2<n^/^, odd 



1 2 A 2 

= -2 E |l-G(0,A:2)r^ + ^E^2|^(0'^2)|V-O(n-^). 

l<fc2<n2/3, odd ^ k2 

Minimizing the dominant term of the lower bound over the filter weights 
provides G*{0, ^2) = — ^3^2^.3 for odd values of /c2 and zero for even values 



1+— 
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of k2- To find a lower bound we calculate the bias term with these optimal 
weights: 

i2 



|2 



n 



> 



l<fc2<n2/3, odd 

^ ^ Vl + 47rV2A;3/n2 ) ^ 

l<k2<n'^/^, odd 

^ I l + 47rV2 ) 

l<fe2<n2/3, odd ^ / A 



l<fc2<n.2/3, odd 



1 + 47r%2 y V 40 



This completes the proof. 



Proof of Theorem^ 
In this section, denote the pixel to be estimated as Xjj. For clarity we 
use the notation Wm/ instead of wfj {m, £). We first characterize some of the 
properties of the SYF weights. 

Lemma 1. Suppose that Xij = 0. // Xm,e = ^ij, then Ei{wm,iym,£) = 0. 

-1 

Furthermore, if - XmA = I, then E^Wm/Vm/) > ^jr^e"^^^^- 

Proof. The first claim is clear from symmetry. To prove the second claim, 
we observe that E{wm,eym,e) = E{wm/Xm,e) + E{wm,eZm,e)- Since Xm,e = 
1, we calculate E{wm,e) and E{wm,eZm,e)- It is clear that E{wm,eZm,e) = 

2 2 

— 7= I Zjji fG ^ > 0. Therefore we calculate 

cry ZTT 'J ~oc ' 



Hwm,e) = — 7= / e 



1 /""^ -1^.1 



00 



/ „ 2<T2r2 V^m/- {^2+^2)2 i 



cT27r 



' —00 

1 ^ 1 

g 2(c7^+t2) / (j2q-2 2((t2+t'-^) 
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This completes the proof. 



□ 



Define the A-neighborhood of a pixel (m,£) as C,^^ = : \i — m\ < 

A,\j-i\<A}nS. 

Lemma 2. Let fi„ = (2A„ + 1)^. We then have 



P 



fir. 



\ 



\ 



SY 



Ew. 



SY 

ml 



> t 



< 2e" 



2 £~2 i 



The proof is a simple application of the Hoeffding inequality. 

Proof of Theorem \^ The first claim is that the optimal neighborhood size 
satisfies A„ = fi(logn). We prove this by contradiction. Suppose that A„ = 
0(log(n)) and consider the performance of the SYF on the image Xij = 
for every It is clear that the bias is zero. However, the variance is 

lower bounded by Q ^ ^ j . This is far from the optimal performance of the 

linear filters analyzed in Theorem |2} Therefore A„ = f2(log(n)). 

Now consider the example image shown in Figure [2] with //i(ti,t2) = 
l{t2<o.5}- For notational simplicity we assume n is even so that the value of 
each pixel is either or 1. Define the two regions Pi = : f < J < 

f + and P2 = : j > f + A„}. At least 1/4 of the pixels in the 

neighborhood of the pixels in Pi have the noise- free value of 1. All pixels in 
the neighborhood of the pixels in P2 have the noise-free pixel values equal to 
1. Over each region we will find a lower bound for the risk of SYF and then 
sum them to obtain a lower bound for the risk over the entire image. 

Case /- € Pi: From the Jensen inequality we have 

E X,, - -^^^ > E 1,'^ 

Define the following two constants: 

mo = E(wfJ'(m,£) I Xij = 0,Xm,e = 0), 



mi = E{wfJ{mJ) I Xij = 0,Xm,e = 1) 
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It is clear that mo > mi. Let the event A be 



A 



< A 



2-e 



(16) 



for some e > 0. We have 



(i) / Z](m/)eC^" WmlUm/ 



2-e 



\^ 4A>o + A2- 

> E - ^'^ 



A^ V{A) 



> 



\ 4A>o + A^- 



P(A=) 



4A2mo + A 



2-e 



A P(A) 



Inequahty (a) uses Lemma [2] and the fact that uiq > mi. Inequahty (6) uses 



Lemma 



and therefore Cq 



Since Cf/ has (2A„ + 1) 



pixels, at least A^ of them have the noise-free pixel value 1. 
Since A„ = f2(logn), Lemma |2] proves that P(A'^) = o(l) and, therefore, the 
bias is lower bounded by 0(1) for all of the pixels in Pi. 

Case II- ihj) € P2'- As mentioned before, all pixels in the neighborhood 
of the pixels in P2 have the noise- free pixel values equal to 1. Hence, we have 



E 



E 
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Defining the event A as in (16), we have 



E 



> E 



> E 



4A>o + A2- 
4A2mo + A 



2-e 
n 




4A^E(w^^^2:n,^£)2 
(4moA2 + A2- 



If the neighborhood size is larger than clog(r;,) for some constant c, then 
Lemma [2] will imply that P(A'^) < o (^). Therefore, the dominant term in 
the above expression of the form of Combining the lower bounds for Pi 

and P2, we obtain a lower bound of the form of + Optimizing over 
An proves that 

inf i?„,(/,/^^)>fi(n-^/3). 



This completes the proof. 



□ 



It is clear from the proof above that the neighborhood size is the main 
parameter that controls the decay rate of the risk of the YF. The Gaussian 
term in the YF weights enables an improvement in the constants but does 
not play any role in the decay rate. In the extreme case of A„ = n, when all 
of the image pixels can potentially contribute to the estimation of a pixel, 
the decay rate of YF degrades to 0(1). This algorithm is called the range 
filter, and |8] observed in practice that it performs much worse than even 
linear filters, as the above analysis confirms. Interestingly, NLM addresses 
this issue and therefore its search space could be the entire image. This is 
the main reason for its improved performance. 

The lower bound proved in Theorem |3] is the same as the upper bound 
we derived for the performance of linear filtering. Therefore, we have the 
following theorem. 

Theorem 6. The risk of the SYF satisfies 



inf sup RnifJ'"^] 
^^''^ f(^H°'{C) 



n 



-2/3 
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4 ■ 3. Proof of Theorem 

The proof has two main steps. First, we show that the risk of the pixels 
far from the edge is 0(log^'^'^'^{n)/n^). Second, we show that the risk of the 
pixels whose Sn neighborhood senses the edge is constant; however there are 
at most 0{nSn) of these pixels. The following two lemmas will play key roles 
in our analysis. 



Lemma 3. Let Z ~ N{0,a^). For A < we have 



2<72 

1 



V1-2A(t2' 

Proof. The proof is a simple integral calculation: 



aV2n J^^ ^ i_2\ 



□ 



Lemma 4. Let Zi, Z2, . . . , Zn be iid A^(0, 1) random variables. The Xn '^^^^ 
dom variable defined as X^iLi ^1 concentrates around its mean with high prob- 
ability, i.e., 



< g-t(i+ln(l-t))_ 



Proof. Here we prove just the first claim; the proof of the second claim follows 
along very similar lines. From Markov's Inequality, we have 



P 



-xt-x 
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Figure 3: An example of a Horizon image. The J„-neighborhood of pixel {ia,ja) G '5'4 does 
not intersect the edge contour, while the (5„-neighborhood of pixel {ib,jb) G intersects 
with the edge contour. 



The last inequality follows from Lemma |3l The upper bound proved above 
holds for any A < ^. To obtain the lowest upper bound we minimize — n 

over A. The optimal value of A is A* = argmiuA n = 2{t+i) • Plugging 



A* into (17) proves the result. □ 



Proof of Theorem We will consider the following partition of the image 
pixels. Let S = {1, 2, . . . , n} x {1,2,..., n}. For a given Horizon function 
define = | ^ > hi^) + S, = | /i(^) < ^ < 

hii) + fh Ss = {{^,J) I M^) -t<i< and S, = {{^,J) \ i < 

h(l)-^}. These regions are displayed in Figure 3 The (^^-neighborhood of 
the pixels in Si and 5*4 do not intersect the edge, while the ^^-neighborhood 
of the other pixels may have pixels from both sides of the edge. See Figure 
jsj For the notational simplicity we write j)e5f the double summation 
over i,j where j satisfies the constraints specified for 5*^. 

Consider a pixel G 5*1. The risk of NLM at this pixel is 



ml 
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where Xij = 0, since G Si. Define the set of oracle weights 



w: 



Define U — , — 
SiU S4}. We then have 



1 if^>h(^), 

n ^ n ' ^ 

otherwise. 



and let the event A 



(18) 

{wm,e = w* I, V(m, ^) e 



E(f/) = E(f/ I A)V{A) + Wj{U I A^)P(A") < E(f/ I A)P(A) + P(A^), (19) 
where the last inequality is due to the fact that the risk of the estimator is 



bounded by 1 . We now calculate each term of ( 19 ) separately. 
Define 5*14 



5*1 U 5*4 and 5*23 = 6*2 U 5*3. Then we have 
E(?7 I A)V{A) 



E 



< E 



< E 



./Urn/ + J2{m,£)eS23 '^m/Vm/ 



A F{A) 



E 



+ E(m/)e523 '^"^'^ 



E(m/)65i4 '^m/^rri/ + E(m/)e523 ^m/Xm/ 



+ E 



E(m/)e5i4 "^m/ + E(m/)e523 '^'"'^ 
E(m,^)65i4 '^m,e^rn,e + E(m/)e523 '^m/Zm/ 



E 



{m,^)eSi4 "^m.^ 



+ E 



{m/)6523 



+ 2 



\ 



E 



E(m,^)e5i4 ^m,e^rn,e + E{m,^)e523 '^m/Xm/ 



E 



(m,^)eSi4 "^"1,^ 



+ E 



{m,^)6523 



X 



E 



E(m/)e5iU54 '^m,e^rn,e + E(m/)e523 '^m/Zm/ 



(20) 



E(m,^)eSi4 ""^m/ + E(m/)e523 "^"'^ 

The last inequality is due to Cauchy-Schwartz. In the next two lemmas we 



bound the last three terms of (20). 
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Lemma 5. Let Wm,e be the weights of NLM with 5n = \og^~^'^ n and tr, 



, ^ ^ for e > 0. Also, let he the oracle weights introduced in (18). 

Then 



E 




Proof. Define Sf as the set of indices of the pixels whose noise- free value is 
neither zero nor one. Since the images are chosen from the Horizon class, the 
cardinality of this set is at most 2n. Plugging in the values of Xm/, we have 



E 



^ / Z](m/)eSi4 '^m/^rn/ + Z](m/)e53 + ^{m,e)eSf '^mlXm/ 

^ ^ ■ E(m/)e5i4 ^m/^^m,^ + Y.{m,l)(iSs ^ + = ^ 

Z](m,^)eSi4 ""^m.^ + Z]{m,^)eS3 / V""-^ 

where Inequality (b) is due to the fact that the expression after Equality 
(a) is an increasing function of 'Yl,(mi)&s-i\Sf '^"^^ ^ decreasing function of 
Y.(m/)eS2\Sf^rni- Therefore, we set Wm,i = 1 for (m,£) G S3 and w^^e = 
for(m,£)e52. □ 

Lemma 6. Let Wm/ be the weights of NLM with 6n = log2"'"'^n and tn = 



, ^ , for e > 0. Also, let be the oracle weights introduced in (18). 
Then we have 

Proof. Since Yli{m 1)^823'^^'^ — ^ interested in the upper bound 
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of the risk, we can remove it from the denominator to obtain 



E 



< E 



m/)gSi4 m,£ 
2^ 



Since -^(!:-^)ggi4 "'-f --^ is the average of iid random variables, it is not hard to 

prove that E (^^iijiili^ii — ""-f = Q(iL\^ To bound the other two terms in 

(21) we use the notation defined in the last section: = j) :\i — m\< 
A, \i — ^1 < A} n S*. We also define E(- | C^j) as the conditional expectation 
given the variables in C^^. We then have 

S(m/)6523 '^m/Zm/ 



E 



E E 



E 



E 



S{m,^)65i4 ^m,e 

S(m,£)65i4 ^m,£ 

^(Z](m ^^')g523 I^(m/)g523 ^^"i/^m/W w,^'^m',<?' | C^^) 
(m,^)eSi4 m/) 




X(m',^')6C^*'? E(m/)e523 ^(^™'^^"^.^^'^''^'^'"'.^')\ ^ ^ 

For the last inequality we have used the Cauchy-Schwartz Inequality to prove 
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that Ej^Wm/Zm/Wm' /' Zm' ,£') < 3(T^. Although we could derive a loose bound 



for E 



and still draw the same conclusion, we used the 



above technique since we have to use it in the proof of Theorem [7) The last 
term we have to bound in (21 ) is 



E 



E 



< 



E 



E 



S(m,^)eSi4 ^rn,e 



< O 



This proves the lemma. 



Using Lemma 5 and Lemma 6 in (20) proves that 



E(f/| A)P(v4) = O 



51 



□ 



(22) 



Finally, using Lemma |4] and the union bound it is easy to show that 



O 



(23) 



It is important to note that the constants of this probability are hidden in 
the O notation. These constants depend on e and increase as e decreases. 
Therefore, we cannot set e = 0. 



Plugging in (23) and (22) in (19) results in 



E 



o 



log' 



n 



V(z,j)G5i. 



Now consider G ^2 U S3. In this region we can bound the error by the 
worst possible risk, which is 1. We will discuss the sharpness of this bound 
in the next section where we develop a lower bound for the risk. 

Using the bounds provided above for the risks of the pixels in 5*1, 6*2, 5*3 
and 5*4, we can now calculate the final upper bound for the risk of the NLM 
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as 



l0g^+2^H(|Si| + |54|) , |52| + |53| 



< 



< O 



loe:2+'fn) 



n 



In order to derive the last inequality we noted that since h{ti) G Hdlder^{l) 
the cardinality of 5*2 and 5*3 are 0(nlog(n)). This completes the proof of 
Theorem HI □ 

4.4- Proof of Theorem^ 

Suppose that the parameters of SNLM satisfy assumptions A1-A4. To 
derive a lower bound we consider the performance of the SNLM algorithm 
on the simple image in Figure [2j For notational simplicity we assume that n 
is even, and hence all of the pixel values are either or 1. The proof follows 
four main steps: 

1. We consider the pixels that are just above the edge, i.e., (i, [f]), and 
prove that the risk of the NLM on these pixels is lower bounded by a 
constant that does not depend on n. 

2. Using asymptotic arguments we prove that the probability a pixel just 
below the edge passes the threshold t„ > is larger than po, where 
Po is a non-zero probability independent of n. Based on this, we use 
a concentration argument to prove that Q{n) of the pixels just below 
the edge will pass the threshold with high probability. See the formal 
statement in Theorem [U 

3. Using symmetry arguments we prove that the probability a pixel that 
is £ < (5„/2 row^ above the edge or below the edge passes the threshold 
is equal. This is formally stated in Lemma |8j 

4. Combining the outcomes of Steps 2 and 3 we show that the risk is 
minimized if all the pixels just above the edge pass the threshold and 
the probability that the other pixels pass the threshold is as low as 



'^The row of an image is the set of all pixels of the form (i, £). 
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possible. If more zero pixels above the edge pass the threshold, then 
more pixels with noise-free value 1 will also pass the threshold, and this 
makes the bias large. Therefore we assume that pn/, the probability 
that a pixel at distance i of the edge passes the threshold, is equal to 
zero for i > 1. However, we have already proven that for i = 1 the 
probability is larger than p^. Theorem [5] uses this fact to show that the 
risk of this estimator is larger than a constant independent of n. 

Proposition 1. Let j* = [|] . For any pixel with coordinates of the form 
{i*,j*), there exists a non-zero constant probability po such that for any 5„ 
and t„ 



Proof. For notational simplicity we use i = i* and j = j* in the proof. We 
have 



p 



p 



>p 



i,P / 

>n Pn I, Pn J 

\ Y.^'lp - - 1 E ^^'0 --^]' 

>n ^ p Pn I, Pn j 



where si^rn = Zm+e,j-i+p- According to the Berry-Esseen Central Limit The- 
orem for independent non-identically distributed random variables ^6], we 
know that 



P 



Pn g p Pn I, PnJ Pn 



where G is a Gaussian random variable with mean zero and bounded standard 
deviation. In fact, it is not difficult to confirm that 

^^^^-^^ + (25„ + 1)^ • 
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Since P(G < -1) > 2po {2po is P{G' < -1) where G' ~ N{0,2a^)) is 
non-zero, for large values of n we can ensure that C/n < po and therefore 
that F{dj^{yij,ym,j-i) < o"^ + tn) > Po- We now prove that even though 
the weights are correlated, 0(n) of the weights will be equal to 1 with very 
high probability. Define Ui as and define the process U = (ui, . . . , 

Break this sequence into 25„ subsequences Ui = {ui, Mi+2<5„, ^ii+45„, • • • , Mn-2<5„+i 
Each Ui has independent and identically distributed elements. Therefore, ac- 
cording to the Hoeffding Inequality, we have P(| J2u eu- > — 

2e " " . On the other hand we know that E{ui) > po. Therefore, 

Uj&i J 

Finally we use the union bound to obtain 

P -ripo < -t) < P 5^ 5^ - ^Po < -t 

< P ( U.{. : J: u, - ^p„ £ U 4*"<=-«- 

□ 

Define the set J = \ j = [jh{^)\}. It is clear that |J| = n. The 

following Corrollary to Proposition [T] shows that NLM sets the weights of 
most of the pixels in J to 1. 

Corollary 1. Consider the image displayed in Figure^ and let 5„ = O^n") 
for a < 1. For any 6n and tn > 0, Q{n) of the pixels in J will pass the 
threshold t^ with very high probability. 

Proof. Set t = in Proposition [l] □ 

Remarkably the above corollary holds in a very general setting even if the 
assumptions A1-A4 do not hold. In other words, NLM in its most general 
form is not able to distinguish between the pixels right above the edge from 
the pixels right below the edge. This is due to the fact that the "signal to 
noise ratio" in the ^^-neighborhood distance estimates is too low at the edge 
pixels. This is the result of the isotropic neighborhoods used to form the 
weight estimates. 
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Lemma 7. If \m — i*\ > 5n/2 and \m' — i*\ > (5„,/2, then 

for any i,m,m' . 

The proof of this lemma is obvious and is skipped here. 
Lemma 8. For £ < Sn/2, 

The proof of this lemma is also obvious from symmetry and is skipped 
here. We can now prove Theorem [5} which provides a lower bound for the 
risk of SNLM. 

Proof of Theorem We derive a lower bound for the risk of SNLM on the 
image displayed in Figure [2j To do so, we consider the pixels just above the 
edge and prove that the SNLM algorithm has risk 0(1) at these pixels. Since 
there are G(n) of these pixels, the risk over the entire image is larger than 

e(n-i). 

Consider a pixel (z*, j*) with j* = [f]. The risk of the SNLM is 
Note that Wm,i is independent of the Um/ according to the construction of the 



SNLM weights in ([TO]). Let Pn/ be the probability P(u;^,i = 1) for £ G {j* - 
^n, i* — + 1, • • • 5 j* + ^n}- Wc cau partition the row {{i,t)\ \ < i < n} into 
25n + 1 subsequences and apply Hoeffding inequality on each subsequence. 
We combine the results of different subsequences with the union bound to 
prove that 

P i^^Wm/ - npn/\ >t^ < ASne^. (25) 
Define the event A as 
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Using the union bound and (25) we have 



Any lower bound on the bias of the estimator leads to a lower bound on its 
risk. Therefore, we find a lower bound for the bias as follows: 



E 



> E 



> E 



E E 

E E Wm/Vm/ 



E npn,e + n-^^6„ 



EE 

E E '^m,e 
a] F{A) > E 



E E Wm/Vm/ 

E npn,e + n-^^Sn 



-P{A' 



where for the last inequality we have used the fact that the risk of SNLM 
is bounded by 1. Since from the construction of SNLM in (10), Wm,£ is 
independent of Zm/, we have 

E E Wm,tym/ 



E 



E npn/ + n^-^Hn 

Enpn,£ + n0-665„ 



P{A' 



E 



ml 



P{A') > 



E E Wm,ex 

E npn,i + n0-665„ 



n 



P{A^) 



Proposition [T] proves that both the numerator E£<j* 



2 Y.Kj" npn,i + n^-^H„ 

and the denome- 

nator 'Y^npn/ + n°'^^5„ are f2(n). Therefore, according to ^43, we can ignore 
the summations E^<j*-^a ^Pn/ and E£>j*+^a f^Pn/- By combining this fact 
with Lemma [HI we obtain 



EnPn,e + nO-'^^6n 



> 



Ef<j* nPn/ 



> 



n + 2Y^^^.,npn,i + nO-(i^6,^ 
In the last inequality we assumed that pnj 



-P{A^). 



PiA" 



for 



I2i<j* '"■Pn,e 



it is enough to note that 



1. To find a lower bound 
IS an 



increasing function of Yle<j* ^Pn,t and therefore is minimized if and only if 
TliKj* "^Pn/ takes its minimum value. However, according to Proposition [l] 
the minimum value of this term is Qin). Therefore, we have 



npo 



P{A' 



> 



npo + n + n-^^6„ 



-piA' 



Po 



Po + l 



(1 + 0(1)). 
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This completes the proof. □ 

5. Tapered NLM Weights 

In this section we show that the upper bound we provided for the NLM 
in Theorem |4] holds in the more general setting of tapered weights. We 
now allow the weights to be a smooth function of the 5n-neighborhood. We 
assume that the weight assignment policy satisfies the following properties: 

Bl: The neighborhood size 5„ = 21og(n). 

B2: The weights are non-negative and bounded, i.e., < Wm/ < «• 

B3: If (f{xij, Xm/) = 0, then the assigned weight satisfies E(wjj(m, £)) > c, 
for some constant c independent of n. 

B4: If (f{xij, Xm,i) = 1, then 'E{wij{m, i)) = O (^'^^- It shall be empha- 
sized that slower decay rate in this expectation, results in slower decay 
in the rate of the NLM algorithm. 

Theorem 7. // the weight assignment policy in NLM satisfies properties 
B1-B4, then 



sup R{fj^)=0 



log 72 



n 



Proof. Consider the four partitions S'i-S'4 defined in the proof of Theorem 
|4j Our goal is to obtain an upper bound for the risk of the pixels in each 
region. The risk of the pixels in S2 and 5*3 will be bounded by the strategy 
we employed in Theorem |4j Here, we just explain how we bound the risk of 
the pixels in Si and 6*4. Since the proof for 5*4 is the same as the proof for 
S'l, we consider just Si. Let (z, j) G Si. Therefore Xjj = and 

2 



2 



E ^^'"'^ '"'^ +E 



V V Yj ^rn/ ) \ Yj Wml ) ) ' 

To obtain an upper bound for the risk, we will find upper bounds for the last 
three terms in (26). Lemmas [o] and 10 below summarize the upper bounds. 
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Lemma 9. Let Wm/ be the weights of NLM satisfying B1-B4- Then 



Proof. Define Sf as the set of tlie indices of the pixels whose noise-free value 
is neither nor 1, and plug in the actual values of Xm/ to obtain 



E 



< E 



(m,^)eS3 " 



To derive the last inequality we use the following facts, which are easy to 
check: 

1. -^^ -^^ — IS an increasing tunc- 

\ 2^{m,e)eSiUS4''""i,t+l^(rn,t)eS2'JS3''""i,e+l^(m,e)eSf''"-rn,e J 

tion of J2i„,,e)eS3\Sf ^rn/- 

„ I ^(m,e)eSA''"rn,e + J2(rn,e)eS-i\Sf'^rn,e+J2(m,e)GSf^m,eXrn,e\ , ■ r 

2. -^^ -^^ — IS a decreasing tunc- 

\ 2^(m,e)eSiUS4'"'"i,t+l^(rn,t)eS2US3''"rn,e+l^(,n^f)^Sf ""^"1,1 J 

tion of J2{rn,e)eS2\Sf ^rn/- 

3- I'S'/I < 2n, i.e., Sf contains at most 2n pixels. 

Our next claim is that X](m ^)e54 ""^m/ ^"^^ X](m ^)e5i ""^m/ concentrate 
around their means. We establish this in a manner very similar to the proof 
of Theorem 5 We first break the Yli(m £)e54 '^rn/ into (4(5„ + 2)^ subsequences 
such that each subsequece contains only independent random variables. In 
other words if Xm,i is in one summation, then no other element of Cf^"'*'^ 
will be in the summation. Therefore, for each summation we can apply the 
Hoeffding inequality. Finally, we use the union bound as explained in the 
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proof of Theorem |5] to show that 



P 



P 



^ Wm,e - ^ E( 



Wm,e - ^ E( 



>tj < 2(45„ + 2)2e'''"+'''^^('->^)esi 
>t\ < 2(45„ + 2)2e(*'"+^''(2:(™,,)ss, «^)_ 



(m,^)e54 {m,^)G5i 

It is straightforward to prove that by setting t to 32a;nlog^'^(n), we have 



P 



P 



(m,^)e5i 



> 32anlog''^(n) \ <0[^ 



> 32an\og^-\n) 1 < O 



(28) 



Define the event F as | Y.{m,e)€Si ^rn/ - E(m/)e5i ^i^m/) < 32an log^'^ raj 

n {lE (m,^)eS4 ^"^.^ ~ S(m,^)e5i -"^("^m/) I <32ar;,log It is clear from 
(|28|) that 

P(^^) = O f §V (29) 



Using (pTh, (28), and (pi we have 



E 



< E 



< o 



E 



(m/)65iU54 



(m,£)eS3\Sf 



a 



F P(F) + P(F'=) 



n 



The last inequality is due to Assumptions B3 and B4. This completes the 
proof of the lemma. □ 
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Lemma 10. Let Wm.e be the weights of NLM with Sn = log(n). ^4/50 assume 
that the weights are set according to B1--BA. We then have 



E 



o 



Proof. We first condition on the event F introduced in the proof of Lemma 

El 



E 



ml 



< E 



< E 



< E 



< O 



ml 



logV) 



The last inequality is due to the fact that 



El 

(m,£)e5i4 



ct2 



c 



El E E 

^(m,£)G5i4 {m',e')eSi4 
^ ^ E{Wm,iZm,eWm',e'Zm',e' I Cfj) = 0(n^5^). 



Therefore, 



E 



(30) 

□ 
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Using the bounds derived in Lemmas |9] and 10 , we can complete the proof 
of the main theorem: 



sup / 



NL\ 



N\2 



< 



log2(n)(|^i| + |^4|) , 1^21 + 15; 



+ 



< O 



\og{n] 



n 



□ 



6. Discussion 

We have provided the first asymptotic result on the risk analysis of the 
nonlocal means (NLM) algorithm on smooth images with sharp edges. In 
contrast to most other filtering approaches, NLM does not consider the spa- 
tial vicinity of the pixels as a feature for setting the weights. Instead, it 
exploits more global features, which leads to improved performance. 

In spite of this success, we have shown that the performance of NLM is 
within a logarithmic factor of the performance of the wavelet thresholding 
and still significantly below the optimal achievable rate. This is due to the 
fact that the isotropic nature of the NLM neighborhoods does not allow the 
algorithm to discriminate the pixels that are close to but below the edge from 
the pixels that are close to but above the edge. This leads to a blurring effect 
that results in high bias along the edge. Exploring the performance of NLM 
with anisotropic neighborhoods may address this issue and is left for future 
research. 
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